Energy and F0 contour modeling with functional data analysis for emotional speech detection
نویسندگان
چکیده
This paper proposes the use of reference models to detect emotional prominence in the energy and F0 contours. The proposed framework aims to model the intrinsic variability of these prosodic features. We present a novel approach based on Functional Data Analysis (FDA) to build reference models using a family of energy and F0 contours, which are implemented with lexicon-independent models. The neutral models are represented by bases of functions and the testing energy and F0 contours are characterized by their projections onto the corresponding bases. The proposed system can lead to accuracies as high as 80.4% in binary emotion classification in the EMODB corpus, which is 17.6% higher than the one achieved by a benchmark classifier trained with sentence level prosodic features. The approach is also evaluated with the SEMAINE corpus, showing that it can be effectively used in real applications.
منابع مشابه
Shape-based modeling of the fundamental frequency contour for emotion detection in speech
This paper proposes the use of neutral reference models to detect local emotional prominence in the fundamental frequency. A novel approach based on functional data analysis (FDA) is presented, which aims to capture the intrinsic variability of F0 contours. The neutral models are represented by a basis of functions and the testing F0 contour is characterized by the projections onto that basis. ...
متن کاملDeep Bidirectional LSTM Modeling of Timbre and Prosody for Emotional Voice Conversion
Emotional voice conversion aims at converting speech from one emotion state to another. This paper proposes to model timbre and prosody features using a deep bidirectional long shortterm memory (DBLSTM) for emotional voice conversion. A continuous wavelet transform (CWT) representation of fundamental frequency (F0) and energy contour are used for prosody modeling. Specifically, we use CWT to de...
متن کاملDiscriminating Neutral and Emotional Speech using Neural Networks
In this paper, we address the issue of speaker-specific emotion detection (neutral vs emotion) from speech signals with models for neutral speech as reference. As emotional speech is produced by the human speech production mechanism, the emotion information is expected to lie in the features of both excitation source and the vocal tract system. Linear Prediction residual is used as the excitati...
متن کاملCorpus-based Synthesis of F0 Conto Using the Generation P
A corpus-based generation of fundamental frequency (F0) contours was realized for emotional speech synthesis. The method, originally developed for read speech, is to predict command values of the F0 contour generation process model with the input of linguistic information of the sentence to be synthesized. Since the generated F0 contour is under the model constraint, a certain quality is still ...
متن کاملWord segmentation in Persian continuous speech using F0 contour
Word segmentation in continuous speech is a complex cognitive process. Previous research on spoken word segmentation has revealed that in fixed-stress languages, listeners use acoustic cues to stress to de-segment speech into words. It has been further assumed that stress in non-final or non-initial position hinders the demarcative function of this prosodic factor. In Persian, stress is retract...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013